AITopics | open-source project

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

OpenAI Launches Full-Scale Effort to Patch Open-Source Bugs as It Takes on Anthropic's Mythos

WIRED, Jun-22-2026, 17:00:00 GMT

Amid concerns about AI models' cybersecurity capabilities, OpenAI revealed an improved version of GPT-5.5-Cyber and its "Patch the Planet" initiative to fix open-source software bugs.

As fears about AI hacking capabilities grow, OpenAI on Monday made a slew of cybersecurity-focused announcements, including an improved version of its limited-access, security-specialized model GPT-5.5-Cyber … As advances across the AI industry leave critical open-source projects at increasing risk of falling behind, though, the company also said on Monday that it is launching an effort known as Patch the Planet, founded with the prominent research-focused security firm Trail of Bits and in collaboration with the vulnerability management firms HackerOne and Calif. The project has already begun offering free security consulting services to open-source maintainers, not only to help them find and patch vulnerabilities but also to support them in strengthening their code bases and incorporating AI security tools into their development process. The idea is to give individualized support to as many open-source projects as possible, improving both their current security and their long-term resilience in a way that will actually be sustainable.

large language model, machine learning, natural language, (19 more...)

WIRED

Country: North America > United States (1.00)

Industry:

Retail (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.87)

Automated Duplicate Bug Report Detection in Large Open Bug Repositories

Laney, Clare E., Barovic, Andrew, Moin, Armin

arXiv.org Artificial Intelligence, Dec-3-2025

Many users and contributors of large open-source projects report software defects or enhancement requests (known as bug reports) to issue-tracking systems. However, they sometimes report issues that have already been reported. First, they may not have time to research existing bug reports sufficiently. Second, they may lack the expertise in that specific area to realize that an existing bug report describes the same matter, perhaps in different wording. In this paper, we propose a novel approach based on machine learning methods that automatically detects duplicate bug reports in an open bug repository from the textual data in the reports. We present six alternative methods: topic modeling, Gaussian Naive Bayes, deep learning, time-based organization, clustering, and summarization using a generative pre-trained transformer large language model. Additionally, we introduce a novel threshold-based approach for duplicate identification, in contrast to the conventional top-k selection method widely used in the literature. Our approach demonstrates promising results across all the proposed methods, achieving accuracy rates ranging from the high 70s to the low 90s (in percent). We evaluated our methods on a public dataset of issues from an Eclipse open-source project.
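The difference between threshold-based and top-k selection can be illustrated with a minimal sketch. It uses plain bag-of-words cosine similarity as a stand-in for the paper's six methods (none of which is reproduced here), and the bug IDs, report texts, k = 2, and the 0.5 cutoff are all hypothetical. Top-k always returns k candidates, even weak ones, whereas a threshold can return none, so a genuinely new report can be recognized as new.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(new_report: str, existing: dict[str, str]) -> list[tuple[str, float]]:
    """Score every existing report against the new one, highest similarity first."""
    query = Counter(tokenize(new_report))
    scored = [(bug_id, cosine_similarity(query, Counter(tokenize(text))))
              for bug_id, text in existing.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def top_k_candidates(new_report: str, existing: dict[str, str], k: int = 2) -> list[str]:
    """Conventional top-k: always returns k candidates, however weak the match."""
    return [bug_id for bug_id, _ in rank_candidates(new_report, existing)[:k]]


def threshold_candidates(new_report: str, existing: dict[str, str],
                         threshold: float = 0.5) -> list[str]:
    """Threshold-based: returns only reports whose similarity clears the cutoff,
    possibly none, in which case the new report is treated as not a duplicate."""
    return [bug_id for bug_id, score in rank_candidates(new_report, existing)
            if score >= threshold]


# Hypothetical repository contents, for illustration only.
EXISTING = {
    "BUG-1": "Editor crashes when opening a large file",
    "BUG-2": "Add dark theme to preferences dialog",
    "BUG-3": "Search results panel does not refresh",
}

if __name__ == "__main__":
    report = "Crash when opening a very large file in the editor"
    print(top_k_candidates(report, EXISTING))      # ['BUG-1', 'BUG-2']
    print(threshold_candidates(report, EXISTING))  # ['BUG-1']
```

Here top-k still proposes the unrelated BUG-2 as a second candidate, while the threshold keeps only BUG-1. The cutoff becomes a tunable trade-off between missed duplicates and false alarms.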

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/COMPSAC65507.2025.00065

2504.14797

Country: North America > United States > Colorado (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

QLPro: Automated Code Vulnerability Discovery via LLM and Static Code Analysis Integration

Hu, Junze, Jin, Xiangyu, Zeng, Yizhe, Liu, Yuling, Li, Yunpeng, Du, Dan, Xie, Kaiyu, Zhu, Hongsong

arXiv.org Artificial Intelligence, Jul-22-2025

Code auditing, a method in which security researchers review source code to identify vulnerabilities, has become increasingly impractical for large-scale open-source projects. While Large Language Models (LLMs) demonstrate impressive code generation capabilities, they are constrained by limitations in context window size, memory capacity, and complex reasoning ability, which makes direct vulnerability detection across entire projects infeasible. Static code analysis tools, though effective to a degree, rely heavily on their predefined scanning rules. To address these challenges, we present QLPro, a vulnerability detection framework that systematically integrates LLMs with static code analysis tools. QLPro introduces both a triple-voting mechanism and a three-role mechanism to enable fully automated vulnerability detection across entire open-source projects without human intervention. Specifically, QLPro first uses static analysis tools to extract all taint specifications from a project, then employs LLMs and the triple-voting mechanism to classify and match these taint specifications, improving both the accuracy and the appropriateness of taint specification classification.
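The abstract does not spell out how the triple vote is resolved, so the following is only a minimal sketch of a generic 2-of-3 majority vote over taint-specification labels. The label set, the `fallback` tie-breaking rule, and the stub voters are assumptions; in a real pipeline each voter would wrap an independent LLM query.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical label set for taint specifications (an assumption, not QLPro's).
LABELS = frozenset({"source", "sink", "sanitizer", "none"})

# A voter maps a taint-specification string to a label; in practice this
# would wrap an LLM call, here it can be any function.
Voter = Callable[[str], str]


def triple_vote(spec: str, voters: Sequence[Voter], fallback: str = "none") -> str:
    """Label one taint specification by 2-of-3 majority vote.

    Answers outside LABELS are discarded. If no label gets at least two
    votes, the fallback label is returned (an assumed tie-breaking rule).
    """
    if len(voters) != 3:
        raise ValueError("triple voting expects exactly three voters")
    answers = (voter(spec) for voter in voters)
    votes = Counter(label for label in answers if label in LABELS)
    if not votes:
        return fallback
    label, count = votes.most_common(1)[0]
    return label if count >= 2 else fallback


if __name__ == "__main__":
    # Stub voters standing in for three independent LLM queries.
    voters = [lambda s: "sink", lambda s: "sink", lambda s: "source"]
    print(triple_vote("java.sql.Statement.executeQuery(String)", voters))  # sink
```

Requiring agreement from two voters trades some recall for robustness against a single hallucinated or malformed answer, which is one plausible motivation for voting over a single query.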

large language model, natural language, vulnerability, (17 more...)

arXiv.org Artificial Intelligence

2506.23644

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.69)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

The Evolution of Darija Open Dataset: Introducing Version 2

Outchakoucht, Aissam, Es-Samaali, Hamza

arXiv.org Artificial Intelligence, May-14-2024

Darija Open Dataset (DODa) is an open-source project aimed at enhancing Natural Language Processing capabilities for the Moroccan dialect, Darija. With approximately 100,000 entries, DODa is the largest collaborative project of its kind for Darija-English translation. The dataset features semantic and syntactic categorizations, spelling variations, verb conjugations across multiple tenses, and tens of thousands of translated sentences. It includes entries written in both the Latin and Arabic alphabets, reflecting the linguistic variations and preferences found in different sources and applications. Such a dataset is critical for developing applications that can accurately understand and generate Darija, supporting the linguistic needs of the Moroccan community and potentially extending to similar dialects in neighboring regions. This paper explores the strategic importance of DODa, its current achievements, and the envisioned future enhancements that will continue to promote its use and expansion in the global NLP landscape.

application, dataset, doda, (14 more...)

arXiv.org Artificial Intelligence

2405.13016

Country: Africa > Middle East > Morocco (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Understanding the Helpfulness of Stale Bot for Pull-based Development: An Empirical Study of 20 Large Open-Source Projects

Khatoonabadi, SayedHassan, Costa, Diego Elias, Mujahid, Suhaib, Shihab, Emad

arXiv.org Artificial Intelligence, May-29-2023

Pull Requests (PRs) that are neither progressing nor resolved clutter the list of PRs, making it difficult for maintainers to manage and prioritize unresolved PRs. To automatically track, follow up on, and close such inactive PRs, Stale bot was introduced by GitHub. Despite its increasing adoption, there is ongoing debate about whether using Stale bot alleviates or exacerbates the problem of inactive PRs. To better understand whether and how Stale bot helps projects in their pull-based development workflow, we perform an empirical study of 20 large and popular open-source projects. We find that Stale bot can help deal with a backlog of unresolved PRs, as the projects closed more PRs within the first few months of adoption. Moreover, Stale bot can help improve the efficiency of the PR review process: after adoption, the projects reviewed PRs that ended up merged, and resolved PRs that ended up closed, faster. However, Stale bot can also negatively affect contributors, as the projects experienced a considerable decrease in their number of active contributors after adoption. Therefore, relying solely on Stale bot to deal with inactive PRs may lead to decreased community engagement and an increased probability of contributor abandonment.
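The track, follow up, and close cycle described above can be sketched as a small decision function. This is a simplified model, not Stale bot's actual implementation or configuration: the `PullRequest` record, the action names, and the 60-day and 7-day thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Illustrative thresholds (assumptions): warn after 60 idle days, then close
# if nobody responds within 7 days of the warning.
DAYS_UNTIL_STALE = 60
DAYS_UNTIL_CLOSE = 7


@dataclass
class PullRequest:
    number: int
    last_activity: datetime                  # last human comment, commit, or review
    marked_stale_at: Optional[datetime] = None


def stale_action(pr: PullRequest, now: datetime) -> str:
    """Return the action a Stale-bot-like policy would take on one open PR."""
    if pr.marked_stale_at is None:
        # Not yet warned: mark stale once the PR has been idle long enough.
        if now - pr.last_activity >= timedelta(days=DAYS_UNTIL_STALE):
            return "mark_stale"
        return "none"
    if pr.last_activity > pr.marked_stale_at:
        # Someone responded after the warning, so the PR is active again.
        return "unmark_stale"
    if now - pr.marked_stale_at >= timedelta(days=DAYS_UNTIL_CLOSE):
        return "close"
    return "none"
```

The closing step is where the trade-off reported in the study shows up: a short grace period clears the backlog quickly, but it also closes PRs from contributors who simply did not respond in time.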

adoption, contributor, stale bot, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3624739

2305.1815

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > China (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Human Computer Interaction (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations

Tony, Catherine, Mutas, Markus, Ferreyra, Nicolás E. Díaz, Scandariato, Riccardo

arXiv.org Artificial Intelligence, Mar-16-2023

Large Language Models (LLMs) like Codex are powerful tools for code completion and code generation tasks, as they are trained on billions of lines of code from publicly available sources. Moreover, these models can generate code snippets from Natural Language (NL) descriptions by learning languages and programming practices from public GitHub repositories. Although LLMs promise effortless NL-driven deployment of software applications, the security of the code they generate has been neither extensively investigated nor documented. In this work, we present LLMSecEval, a dataset of 150 NL prompts that can be used to assess the security performance of such models. These prompts are NL descriptions of code snippets prone to various security vulnerabilities listed in MITRE's Top 25 Common Weakness Enumeration (CWE) ranking. Each prompt in the dataset comes with a secure implementation example to facilitate comparative evaluation against code produced by LLMs. As a practical application, we show how LLMSecEval can be used to evaluate the security of snippets automatically generated from NL descriptions.
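To show what pairing an NL prompt with a secure reference implementation can look like, here is a hypothetical prompt written for illustration (it is not an actual LLMSecEval entry) targeting CWE-89, SQL injection, which appears in MITRE's Top 25. The insecure function is the kind of code an LLM might plausibly generate; the secure one uses a parameterized query via Python's standard `sqlite3` module.

```python
import sqlite3

# Hypothetical NL prompt, written for illustration (not an LLMSecEval entry).
PROMPT = "Write a function that returns the email of the user with a given username."


def setup_db() -> sqlite3.Connection:
    """Create an in-memory database with two sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "alice@example.com"), ("bob", "bob@example.com")])
    return conn


def get_email_insecure(conn: sqlite3.Connection, username: str) -> list[str]:
    # Vulnerable to SQL injection (CWE-89): input is spliced into the SQL text.
    query = f"SELECT email FROM users WHERE username = '{username}'"
    return [row[0] for row in conn.execute(query)]


def get_email_secure(conn: sqlite3.Connection, username: str) -> list[str]:
    # Parameterized query: the driver binds username as data, never as SQL.
    return [row[0] for row in conn.execute(
        "SELECT email FROM users WHERE username = ?", (username,))]


if __name__ == "__main__":
    conn = setup_db()
    payload = "' OR '1'='1"
    print(get_email_insecure(conn, payload))  # leaks every user's email
    print(get_email_secure(conn, payload))    # []
```

A comparative evaluation can then feed attack inputs like `payload` to both the generated snippet and the reference implementation and flag any divergence.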

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2303.09384

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.90)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Automatically Identifying Relations Between Self-Admitted Technical Debt Across Different Sources

Li, Yikun, Soliman, Mohamed, Avgeriou, Paris

arXiv.org Artificial Intelligence, Mar-13-2023

Self-Admitted Technical Debt (SATD) can be found in various sources, such as source code comments, commit messages, issue tracking systems, and pull requests. Previous research has established that relations exist between SATD items in different sources; such relations can be useful for investigating and improving SATD management. However, approaches for automatically detecting these SATD relations are currently lacking. To address this, we propose and evaluate approaches for automatically identifying SATD relations across different sources. Our findings show that our approach outperforms baseline approaches by a large margin, achieving an average F1-score of 0.829 in identifying relations between SATD items. Moreover, we explore the characteristics of SATD relations in 103 open-source projects, describe nine major cases in which related SATD is documented in a second source, and give a quantitative overview of 26 kinds of relations.
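For readers less familiar with the metric behind the 0.829 figure, here is a short sketch of how an F1-score is computed from true positives, false positives, and false negatives. The abstract does not say what the average is taken over, so `average_f1` assumes an unweighted mean over groups (for example, folds or relation types); the counts in the usage line are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R): the harmonic mean of precision P and recall R."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f1(counts: list[tuple[int, int, int]]) -> float:
    """Unweighted mean of per-group F1 scores (assumed averaging scheme)."""
    if not counts:
        raise ValueError("need at least one (tp, fp, fn) group")
    return sum(f1_score(*group) for group in counts) / len(counts)


if __name__ == "__main__":
    # Hypothetical counts for two groups of predicted SATD relations.
    print(round(average_f1([(8, 2, 2), (9, 1, 0)]), 3))  # 0.874
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which makes it a stricter summary than accuracy when related pairs are rare.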

machine learning, natural language, satd item, (20 more...)

arXiv.org Artificial Intelligence

2303.07079

Country:

South America > Uruguay > Maldonado > Maldonado (0.05)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
Europe > Netherlands (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Data Science (0.93)

best way to be a machine learning engineer

#artificialintelligence, Mar-10-2023, 21:25:16 GMT

Becoming a machine learning engineer requires a combination of skills and knowledge in areas such as mathematics, programming, data analysis, and machine learning algorithms.

Learn the basics of mathematics and statistics: Machine learning requires a strong foundation in mathematics and statistics. You should be familiar with calculus, linear algebra, probability, and statistics.

Master a programming language: You should learn a programming language commonly used for machine learning, such as Python or R. You should also be familiar with data structures, algorithms, and object-oriented programming.

engineer, machine learning, skill and knowledge, (9 more...)

#artificialintelligence

Genre: Instructional Material (0.37)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.57)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.53)

Behavior Trees and State Machines in Robotics Applications

Ghzouli, Razan, Berger, Thorsten, Johnsen, Einar Broch, Wasowski, Andrzej, Dragule, Swaib

arXiv.org Artificial Intelligence, Mar-6-2023

Autonomous robots combine skills to form increasingly complex behaviors, called missions. While skills are often programmed at a relatively low abstraction level, their coordination is architecturally separated and often expressed in higher-level languages or frameworks. State machines have been the go-to language for modeling behavior for decades, but recently behavior trees have gained attention among roboticists. Although several implementations of behavior trees are in use, little is known about their usage and scope in the real world. How do the concepts offered by behavior trees relate to traditional languages, such as state machines? How are concepts in behavior trees and state machines used in actual applications? This paper is a study of the key language concepts in behavior trees as realized in domain-specific languages (DSLs), internal and external DSLs offered as libraries, and their use in open-source robotic applications supported by the Robot Operating System (ROS). We analyze behavior-tree DSLs and compare them to the standard language for behavior models in robotics: state machines. We identify DSLs for both behavior-modeling languages and analyze five in depth. We mine open-source repositories for robotic applications that use the analyzed DSLs and analyze their usage. We identify similarities between behavior trees and state machines in terms of language design and the concepts offered to accommodate the needs of the robotics domain. We observed that the usage of behavior-tree DSLs in open-source projects is increasing rapidly. We observed similar usage patterns in model structure and code reuse in the behavior-tree and state-machine models within the mined open-source projects. We contribute all extracted models as a dataset, hoping to inspire the community to use and further develop behavior trees, associated tools, and analysis techniques.
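To make the contrast between the two formalisms concrete, here is a minimal Python sketch of each. It is not the API of any DSL or ROS library analyzed in the paper; the three-valued tick status, the `Sequence`/`Fallback`/`Action` node types (following the common textbook formulation of behavior trees), and the battery-or-goal mission are illustrative assumptions.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Action:
    """Leaf node: runs a callable (a robot skill) and reports its Status."""

    def __init__(self, name: str, fn: Callable[[], Status]) -> None:
        self.name = name
        self.fn = fn

    def tick(self) -> Status:
        return self.fn()


class Sequence:
    """Ticks children left to right; returns the first non-SUCCESS status."""

    def __init__(self, *children) -> None:
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS


class Fallback:
    """Ticks children left to right; returns the first non-FAILURE status."""

    def __init__(self, *children) -> None:
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE


# Hypothetical robot state and skills for a "reach the goal, else recharge" mission.
state = {"battery": 80, "log": []}


def battery_ok() -> Status:
    return Status.SUCCESS if state["battery"] > 20 else Status.FAILURE


def go_to_goal() -> Status:
    state["log"].append("goal")
    return Status.SUCCESS


def go_to_charger() -> Status:
    state["log"].append("charger")
    return Status.RUNNING  # still driving toward the charger


mission = Fallback(
    Sequence(Action("battery_ok", battery_ok), Action("go_to_goal", go_to_goal)),
    Action("go_to_charger", go_to_charger),
)

# A comparable mission as an explicit state machine: a transition table keyed
# by (current state, event); unknown events leave the state unchanged.
TRANSITIONS = {
    ("to_goal", "low_battery"): "to_charger",
    ("to_charger", "charged"): "to_goal",
}


def fsm_step(current: str, event: str) -> str:
    return TRANSITIONS.get((current, event), current)
```

In the tree, the recharge branch is implicit: it runs whenever the left branch fails, so adding another fallback skill means adding one child. In the state machine, each reaction must be spelled out as a named transition, which is more explicit but grows with every new state and event.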

artificial intelligence, behavior tree, dsl, (17 more...)

arXiv.org Artificial Intelligence

2208.04211

Country:

Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
(11 more...)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.67)

Infrastructure in '23

#artificialintelligence, Feb-27-2023, 20:31:11 GMT

For the third year running, I set aside some time at the beginning of the year to share what I believe to be the most dynamic and important areas of innovation in infrastructure. If you share my interest in any one or more of these areas, I would love to hear from you.

The future of cloud is here, and it's JavaScript

While I have written previously about the rise of serverless computing, I was slow to appreciate the role JavaScript would play in pushing it forward. JavaScript is the only language that lives up to "write once, run anywhere." It has the most vibrant ecosystem of any language on the planet, unmatched startup times, and is secure enough to run untrusted code on behalf of users without modification or special tooling.

infrastructure, javascript, platform, (15 more...)

#artificialintelligence

Industry: Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Software (0.97)